ABSTRACT

This  study  is  an  extension  of  previous  statistically 
oriented  research  at  the  Naval  Postgraduate  School.   The 
method  of  Model  Output  Statistics  is  used  to  predict  open- 
ocean  visibility  employing  stepwise-selection,  multiple 
linear  regression.   The  visibility  predictand  is  specified 
categorically with comparisons made to a previous probabilistic
approach. Predictors include direct and derived
model  output  parameters  provided  by  the  U.S.  Navy's  Fleet 
Numerical  Oceanography  Center  (FNOC),  Monterey,  California. 
About  18,000  North  Pacific  Ocean  (30°-60°N)  synoptic  ship 
reports  at  0000  GMT  from  June  1976  and  1977,  July  1979, 
and  August  1979  were  used  as  both  dependent  and  independent 
data  sets.   Visibility  equations  for  both  analysis-time 
and  24-  and  48-hr  prognostic  times  are  developed,  and  are 
verified  using  percent  correct,  Heidke  skill  score,  and 
bias. Levels of skill are less than desirable for operational
use. Important predictor parameters are found to
be sensible and evaporative heat fluxes, meridional wind
component, sea-level pressure, air/sea temperature difference,
relative humidity, an FNOC fog probability parameter,
and a visibility parameter derived from a marine aerosol
model. Other experiments concerning weighted least squares,
predictand transformations and R² deflation are briefly
described.
I. INTRODUCTION AND BACKGROUND

Visibility  is  an  important  meteorological  variable  that 
can  have  a  significant  impact  on  the  safety  of  maritime 
operations.   Naval  activities  such  as  amphibious  assault, 
underway  replenishment   and  air  operations  can  be  greatly 
restricted  under  conditions  of  low  visibility.   Civilian 
operations  can  suffer  also.   In  most  cases  poor  visibility 
at sea is due to the occurrence of fog. The economic, military
and human losses associated with United States Naval
Operations  attributable  to  fog  are  well  documented  by  Wheeler 
and  Leipper  (1974).   Thus  accurate  forecasts  of  fog,  or  more 
generally,  marine  visibility,  would  be  of  great  benefit  to 
the  military  and  civilian  communities. 

Earlier research into this problem at the Naval Postgraduate
School (NPS), Monterey, California, using statistical
methods, was conducted by Van Orman and Renard (1977), Quinn
(1978), and Ouzts and Renard (1979), who all applied regression
techniques  to  forecast  the  occurrence  of  fog  with  some  degree 
of  skill.   Research  into  forecasting  visibility  directly,  but 
using  a  very  limited  set  of  parameters  and  data,  was  conducted 
by Schramm (1966). Further work by Nelson (1972) used a
larger  data  set  and  investigated  new  parameters.   More  recently 
the  work  by  Aldinger  (1979)  continued  research  into  determining 
those  parameters  which  are  statistically  correlated  with  marine 
visibility.   In  addition,  using  a  probabilistic  approach, 
Aldinger derived analysis-time linear regression equations
which show a reasonable degree of probabilistic skill. He
also expanded the evaluation of these equations to categorical
estimates using Threat Score, Heidke Skill Score and
percent correct. In addition, he adapted a scoring awards
matrix  to  the  verification  which  enhances  the  skill  by  giving 
partial  credit  to  forecasts  that  are  close  to  the  observed 
category. 

This  study  continues  the  statistical  regression  work  on 
visibility  analysis/forecasting,  but  uses  a  categorical 
approach  rather  than  a  probabilistic  one.   New  predictor 
parameters  are  investigated  and  prognostic,  as  well  as 
analysis-time,  equations  are  derived.   In  addition,  more 
attention  is  given  to  interpreting  the  statistical  methods 
used. 
II. OBJECTIVES

The  primary  objective  of  this  study  was  to  expand  on 
previous  NPS  visibility  research  using  numerical-model  output 
parameters from the Fleet Numerical Oceanography Center
(FNOC), Monterey, California, to diagnose and predict marine
visibility  over  the  open  ocean  by  statistical  means.   The 
method  of  model  output  statistics  (MOS)  (see  Glahn  and  Lowry, 
1972)  was  used  to  predict  visibility  categories  directly  as 
opposed  to  using  a  probabilistic  approach. 

Within  the  primary  objective,  more  specific  goals  to  be 
achieved  were  to: 

(1) develop statistical diagnostic (analysis-time, or Tau
0 hr) and prognostic (forecast-time, or Tau 24 hr, 48 hr)
visibility equations using stepwise multiple linear regression;

(2) test several types of categorical schemes;

(3) test various forms of the visibility predictand
in the regression program;

(4) test predictor parameters not previously used in NPS
visibility research;

(5) compare the categorical approach to the probabilistic
approach as used by Aldinger (1979);

(6) test methods of regression other than the least-
squares linear type.


Note: FNOC was formerly called the "Fleet Numerical Weather Central."
III. DATA

A.  AREA 

The area of study was limited to a region of the North
Pacific Ocean located approximately between 30° and 60°N and
from 145°E to 130°W. The actual area was restricted in size
from the limits mentioned in order to reduce the number of
land-influenced grid points used in computing derivatives
applicable at marine grid locations. Also, this was done to
eliminate, as much as possible, any orographic influences on
visibility. The study area is shown in Figure 1 on a polar
stereographic projection, the grid points of which correspond
to those of the standard FNOC 63 x 63 grid (with a mesh size
of 381 km at 60°N). The entire FNOC grid is shown in Figure 2
with an outlined area from which FNOC's model output parameters
were extracted. This study area is the same as that used
for recent statistical studies of marine fog and visibility
at NPS.

B.  SELECTION  OF  TIME  PERIOD 

Data from the months of June, July and August only were
used in this study. The frequency of fog- (and thus visibility-)
related maritime casualties reaches a peak during the Northern
Hemisphere summer months (Figure 3). Therefore, this period
is one of primary operational significance.

Figure 2. Fleet Numerical Oceanography Center's 63x63
grid, with outline of North Pacific Ocean
rectangular grid area used in study.

Only 0000 GMT synoptic ship report data were used as
this  ensured  that  daylight  was  present  throughout  the  study 
area,  thus  allowing  more  accurate  visibility  observations 
than  if  nighttime  observations  were  included. 

Model  output  parameter  data  from  FNOC  were  taken  from 
0000  GMT  for  use  in  analysis-time  equations.  However,  in 
prognostic  equations  1200  GMT  parameters  also  were  used. 

Diagnostic (Tau 0 hr) equations were developed from
combined June 1976 and June 1977 data using analysis-time
data only. In addition, equations for Tau 0, 24 and 48
hr were developed from July 1979 data using both analysis-
time and prognostic-time parameters.

C.   SYNOPTIC  WEATHER  REPORTS 

The synoptic weather reports used in this study were
provided by the Naval Oceanography Command Detachment
co-located with the National Climatic Center at Asheville, North
Carolina.

The total number of observations available in the area
of Figure 1 is as follows:

June 1976     (Tau 0)     4277
June 1977     (Tau 0)     5044
July 1979     (Tau 0)     4079
              (Tau 24)    4095
              (Tau 48)    4102
August 1979   (Tau 0)     4727
              (Tau 24)    4520
              (Tau 48)    4421

The actual number of cases varied slightly from the numbers
given above depending on experiments being performed.

Note: The Naval Oceanography Command Detachment was formerly
called the "Naval Weather Service Detachment."

All synoptic reports from the June data sets were put
through a quality control check by Aldinger (1979) to
ensure a certain degree of compatibility among present weather
and visibility codes, in conformance with the Federal Meteorological
Handbook No. 2 (U.S. Depts. of Commerce, Defense,
and Transportation, 1969). All data sets, including July and
August 1979 data, were quality-control checked by the National
Climatic Center, Asheville, N.C.

D.  INTERPOLATION  SCHEME 

All  model  output  parameters,  whose  positions  are  within 
the  FNOC  grid,  were  interpolated  to  the  ship  positions  from 
which the synoptic observations were obtained. The interpolation
method used is a natural bicubic spline curvilinear
scheme. This scheme and its documentation are available at
the  NPS  W.R.  Church  Computer  Center  where  all  the  computer 
computations  for  this  study  were  accomplished. 

E.  PREDICTOR  PARAMETERS 

1.   Model  Output  Parameters  (MOP's) 

A total of 22 analysis- and prognostic-model parameters
were provided by FNOC. They were generated from the Mass
Structure Analysis model, the Primitive Equation (P.E.)
model, and the Marine Wind model [U.S. Naval Weather Service,
1975]. In addition, 79 other parameters were developed from
the original set. Brief descriptions of all of these
parameters are listed in Appendix A.

2.  Climatology  Parameter 

The only climatology factor used as a parameter in
this study is the fog climatology developed by the National
Climatic Center [Guttman, 1978]. A suitable visibility climatology
was not available at the time of this study.

3. Interactive and Modified Parameters

Interactive parameters were formed in this study by
using the product of two different parameters. They have
been used to account for possible physical interactions between
variables. Other parameters, called "modified", are simply
the square, or the square root, of an MOP. A decision as to
which variables to combine or modify out of an almost unlimited
number of possibilities is a difficult task. Therefore,
four of the parameters chosen here were taken from a
previous study by Ouzts (1979). The remainder were chosen
by combining or modifying those parameters which contributed
significantly to explaining the variance of the predictand
in one or more experiments of this study.
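
The construction of interactive and modified parameters can be illustrated with a minimal sketch. The function names and the sample relative humidity and wind speed values are hypothetical and serve only as illustration; the parameters actually formed in this study are those listed in Appendix A.

```python
import math

def interactive(a, b):
    """Element-wise product of two parameter series (an 'interactive' parameter)."""
    return [x * y for x, y in zip(a, b)]

def modified(values, kind):
    """Square or square root of one parameter series (a 'modified' parameter)."""
    if kind == "square":
        return [v * v for v in values]
    if kind == "sqrt":
        return [math.sqrt(v) for v in values]  # assumes non-negative input
    raise ValueError(f"unknown modification: {kind}")

# Hypothetical values at three ship positions: relative humidity (%) and wind speed (m/s).
rh = [81.0, 64.0, 100.0]
wind = [5.0, 10.0, 2.0]

rh_times_wind = interactive(rh, wind)   # product of two different parameters
rh_squared = modified(rh, "square")     # square of a single parameter
rh_root = modified(rh, "sqrt")          # square root of a single parameter
```

Each derived series is then offered to the regression program as an additional candidate predictor alongside the original parameters.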

4. Binary Parameters

This type of parameter is commonly used by the
Techniques Development Laboratory of the National Weather
Service, Silver Spring, Maryland. A binary parameter
is formed from an MOP by choosing one or more critical values
of that MOP which, when equaled or exceeded, gives the binary
a value of one; otherwise the binary has a value of zero.
Here again, a seemingly infinite number of parameters is
possible, but the set of binary parameters was limited to
14 in this study.
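
A binary parameter as described above can be sketched in a few lines. The function name, the sample relative humidity values and the critical values of 90 and 95 percent are assumptions chosen for illustration; they are not the critical values used in this study.

```python
def binary_parameter(values, critical):
    """Return 1 where the parameter equals or exceeds the critical value, else 0."""
    return [1 if v >= critical else 0 for v in values]

# Hypothetical relative humidity values (%) at three ship positions.
rh = [88.0, 90.0, 97.5]

# One binary predictor is produced per critical value.
rh_ge_90 = binary_parameter(rh, 90.0)
rh_ge_95 = binary_parameter(rh, 95.0)
```

Choosing several critical values for the same MOP yields several binary predictors, which is why the number of possible binaries grows so quickly.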

5. Beta Visibility Parameter

The information for the computation of this parameter
was supplied by Dr. A. Goroch (personal communication) of the
Naval Environmental Prediction Research Facility. The computation
uses a marine aerosol model developed for the United States Navy
to test electro-optical system performance.

Apparently no formal documentation is available on
the development of this model. However, Nounkester (1980)
refers to this model and states that it was developed by
modifying an empirical model proposed by Wells et al. (1977).
The modifications were made by B. Katz of the Naval Surface
Weapons Center, White Oak, Maryland; L. Ruhnke of the Naval
Research Laboratory, Washington, D.C.; and M. Munn of the
Lockheed Research Laboratory, Palo Alto, California.

The aerosol model computes extinction coefficients and
ranges at various wavelengths, as affected by molecular
scattering and absorption, aerosol extinction and weather.
Only the visual range was of interest in this study, so
only that portion of the model was used.

As  input,  the  FNOC  model  output  surface  windspeed  and 
relative  humidity,  and  present  weather  code  were  supplied. 
Then,  a  parameterized  visibility  was  computed,  herein  called 
beta visibility (BVIS). Since two relative humidity parameters
were  available,  RHR  and  RHX,  two  beta  visibility  parameters 
could  be  computed,  BVISR  and  BVISX. 

Because  the  present  weather  code  was  not  available 
at  prognostic  times,  beta  visibility  could  not  be  computed 
at Tau 24 and Tau 48. However, since the aerosol extinction
itself was expected to correlate well with observed visibility,
a modified beta visibility parameter was formed by
simply omitting the weather code input. This modified beta
visibility  (MBVIS)  could  then  be  used  at  prognostic  times. 
The  method  produced  a  less  accurate  parameter,  but  one  that 
still  correlated  well  with  observed  visibility.   The  methods 
used  for  computing  the  BVIS  and  MBVIS  parameters  are  given 
in  Appendix  B . 3 . 
IV. PROCEDURE

A.   REGRESSION  SCHEME 

A  computer  program  for  stepwise  multiple  linear  regression 
using  the  method  of  least  squares  was  used  to  derive  the 
visibility  equations.   The  program  used  is  one  of  the  UCLA 
BMDP series, namely BMDP2R [UCLA, 1979].

In  this  program  the  dependent  variable  (predictand)  is 
specified,  then  independent  variables  (predictors)  are  entered 
(forward  stepping)  or  removed  (backward  stepping)  based  on  a 
statistical F-test with given F-to-enter (4.0) and F-to-remove
(3.9). The first predictor selected in forward stepping
is the predictor variable with the highest F-to-enter. Succeeding
steps enter variables in the same manner. At each
step  the  variables  already  entered  into  the  equation  are 
reevaluated  and  could  be  removed  by  backward  stepping  if  they 
fail  to  exceed  the  minimum  F-to-remove  value. 

If a variable being considered for entry is nearly a
linear combination of the variables already entered,
it may cause computational difficulties, and the BMDP2R
program will reject it if its tolerance value falls
below 0.01. The program continues stepping until all
variables are used, or until no further variables meet the
F-to-enter value. A further definition of the statistics
used is included in Appendix C.
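
The stepping logic described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions and is not the BMDP2R code: the helper names (solve, rss, stepwise) and the synthetic data are hypothetical, tolerance is taken to be one minus the squared multiple correlation of a candidate with the variables already entered (candidates whose tolerance falls below the limit are skipped), and the backward check removes the first variable found below F-to-remove rather than necessarily the weakest one.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    design = [[1.0] * len(y)] + columns
    k = len(design)
    xtx = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(design[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(beta[j] * design[j][i] for j in range(k))) ** 2
               for i, yi in enumerate(y))

def stepwise(predictors, y, f_enter=4.0, f_remove=3.9, tol_min=0.01):
    """Return the predictor names chosen by forward stepping with backward checks."""
    n = len(y)
    entered = []
    for _ in range(4 * len(predictors)):  # hard cap on steps, a safeguard only
        cols = [predictors[name] for name in entered]
        base = rss(cols, y)
        best_name, best_f = None, f_enter
        for name, col in predictors.items():
            if name in entered:
                continue
            total = rss([], col)  # sum of squares about the candidate's mean
            if total == 0.0 or rss(cols, col) / total < tol_min:
                continue  # constant, or nearly a linear combination of entered variables
            new = rss(cols + [col], y)
            dof = n - len(entered) - 2
            f = float("inf") if new <= 0.0 else (base - new) / (new / dof)
            if f > best_f:
                best_name, best_f = name, f
        if best_name is None:
            break  # no remaining candidate meets F-to-enter
        entered.append(best_name)
        # Backward check: re-evaluate every entered variable against F-to-remove.
        full = rss([predictors[name] for name in entered], y)
        dof = n - len(entered) - 1
        for name in list(entered):
            rest = [predictors[other] for other in entered if other != name]
            f = float("inf") if full <= 0.0 else (rss(rest, y) - full) / (full / dof)
            if f < f_remove:
                entered.remove(name)
                break
    return entered

# Synthetic, hypothetical data: y depends on x1 and x2; x3 is almost a copy of x1.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
wiggle = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
x3 = [a + 0.01 * w for a, w in zip(x1, wiggle)]
y = [3.0 + 2.0 * a + b + 0.5 * e for a, b, e in zip(x1, x2, noise)]

selected = stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
```

In this synthetic case x1 enters first, x2 enters second, and x3 is never fitted because its tolerance relative to x1 is far below 0.01. The tolerance test is applied before the candidate is added to the design matrix, so the nearly singular system is never solved.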
Another regression routine available is BMDP9R, called
All Possible Subsets Regression. Rather than performing
a screening regression as in BMDP2R, this program considers
all possible combinations of predictor variables to achieve
the highest possible R² value (explained variance). This
program was used for a few experiments. Some of the computed
subsets did manage to attain a higher R² value than
that achieved by screening regression, but these R² values
were only marginally higher and have doubtful significance.
Thus, the results achieved by this method did not justify
the excessive computer time involved, and so it was abandoned.

B.   CATEGORICAL  APPROACH 

Previously at NPS, Aldinger (1979) developed analysis-
time  visibility  regression  equations  based  on  a  probability 
approach.   Equations  were  developed  to  estimate  the  probability 
of  occurrence  of  each  of  several  visibility  code  groupings. 
In  this  study  a  categorical  approach  was  used.   Several  schemes 
for  grouping  visibility  codes  into  different  categories  were 
used.   In  order  to  have  a  visibility  value  for  the  predictand 
the  midpoint  value  of  the  visibility  range  for  each  observed 
category was used. For example, if a category included synoptic
codes 90-93 the visibility range would be 0-1 km, and the
visibility  predictand  was  assigned  the  value  of  0.5  km.   An 
exception  to  this  rule  was  made  for  the  highest  visibility 
category.   Since  this  category  has  no  upper  limit,  several 
arbitrary visibility values were assigned to the predictand
depending on the categorical scheme involved. A list of
the synoptic visibility codes used to determine the
visibility categories can be found in the Federal Meteorological
Handbook No. 2 [U.S. Depts. of Commerce, Defense,
and Transportation, 1969].

The  regression  equations  so  developed  yield  continuous 
visibility  values  (in  kilometers)  which  can  be  used 
directly,  or  perhaps  more  appropriately,  can  be  used  to 
specify  the  selected  category.   The  latter  method  is  used 
in this study for verification purposes.
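
The two mappings described in this section (observed code to predictand value, and continuous regression output back to a category) can be sketched as follows, using the five-category (5CAT) scheme defined in Section V.A.3. The function names are hypothetical, and assigning negative regression output to category I is an assumption made for the sketch.

```python
import bisect

# (category, synoptic visibility codes, lower bound of range in km, predictand in km)
FIVE_CAT = [
    ("I", (90, 91, 92), 0.0, 0.25),
    ("II", (93, 94), 0.5, 1.25),
    ("III", (95, 96), 2.0, 6.0),
    ("IV", (97,), 10.0, 15.0),
    ("V", (98, 99), 20.0, 35.0),
]
LOWER_BOUNDS = [row[2] for row in FIVE_CAT]

def predictand_from_code(code):
    """Predictand value (km) assigned to an observed synoptic visibility code."""
    for _, codes, _, value in FIVE_CAT:
        if code in codes:
            return value
    raise ValueError(f"not a synoptic visibility code in 90-99: {code}")

def category_from_visibility(vis_km):
    """Category whose visibility range contains a continuous regression estimate."""
    index = bisect.bisect_right(LOWER_BOUNDS, vis_km) - 1
    return FIVE_CAT[max(index, 0)][0]  # negative estimates fall in category I

observed_value = predictand_from_code(96)
estimated_category = category_from_visibility(12.3)
```

The first function is used when building the dependent data set; the second is used at verification time to convert each regression estimate into a selected category.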

Since  there  are  only  ten  reported  synoptic  visibility 
codes,  with  each  code  representing  a  range  of  visibility, 
the  maximum  number  of  defined  categories  is  limited  to  ten. 
Using  the  maximum  number  of  categories  allows  the  greatest 
visibility resolution. However, there is some inaccuracy
involved in visibility reporting that is related to an observer's
ability to discriminate between different visibility
ranges. Therefore, categorical schemes were developed which
combined  several  observed  codes  into  one  category.  This 
approach  provides  a  wider  visibility  range  for  each  category 
and  partly  compensates  for  observer  error.  It  is  reasoned 
that  an  observer  should  be  able  to  distinguish  between  a  few 
larger  visibility  ranges  better  than  a  larger  number  of  smaller 
visibility  ranges.   Of  course,  with  fewer  categories  some 
visibility  resolution  is  lost.  In  the  extreme  case,  a  scheme 
with only one category, which includes all visibility values,
would  not  be  affected  by  observer  error,  and  all  regression 
estimates  would  be  perfect.   However,  such  a  scheme  obviously 
would  be  useless.   Therefore,  some  tradeoff  between  accuracy 
and  resolution  should  be  made.   In  this  study  schemes  involving 
five  and  ten  categories  were  tested. 

Tau  0  equations  were  developed  for  all  categorical  schemes 
from  combined  June  1976  and  June  1977  data.   The  predictor 
parameters  considered  in  the  equations  are  listed  in  Appendix 
A,  part  1. 

Analysis-time (Tau = 0 hr) and prognostic (Tau = 24 and
48 hr) equations were developed from July 1979 data. Prognostic
equations at 24 hr and 48 hr only were developed so
that the verification times would correspond to 0000 GMT.
However, MOP's from 00, 12, 24, 36, and 48 hr were used. The
parameter list used to develop these equations is located in
Appendix A, part 2.

C.   EQUATION  TRUNCATION  AND  VERIFICATION 

The BMDP2R regression routine enters a new variable at
each step, increasing the R² value each time, thus fitting
the equation better to the dependent data. After a certain
number of steps, however, the incremental increase in R² per
step may have little or no significance when the equation is
applied to independent data. For this reason it was decided
to truncate each equation before entering a variable which
does not increase the R² value by a rounded value of 1%.
In general this produced an equation with four to six variables.
More will be said on this topic later.
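
The truncation rule can be sketched as below. The reading of "a rounded value of 1%" as an R² gain that rounds to at least one percentage point is an assumption, and the function name and the sequence of R² values are hypothetical.

```python
def truncation_step(r2_by_step):
    """Number of variables kept before the first R² gain that rounds to less than 1%."""
    kept = 0
    previous = 0.0
    for r2 in r2_by_step:
        gain_percent = round((r2 - previous) * 100.0)
        if gain_percent < 1:
            break  # this variable and all later ones are dropped
        kept += 1
        previous = r2
    return kept

# Hypothetical cumulative R² after each stepwise entry.
r2_sequence = [0.12, 0.18, 0.22, 0.25, 0.262, 0.265, 0.27]
variables_kept = truncation_step(r2_sequence)
```

With this hypothetical sequence the fifth variable adds 1.2 percentage points and is kept, while the sixth adds only 0.3 and triggers truncation.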

Two scoring methods were used to describe the skill of
each final regression equation. These two methods consist
of computing the percentage of correct forecasts and the Heidke
skill score for each equation. The formulas for computing these
scores are given in Appendix D. The continuous visibility
output from a regression equation lies within the visibility
range of a particular category. This particular category is
considered to be the one estimated by the regression equation.
The number of times each category is thus estimated is compared
to the number of observations of each category for
scoring purposes.

All  equations  were  verified  against  the  dependent  data 
from  which  they  were  derived.   In  addition,  all  five-category 
equations were verified against independent data. Equations
developed from combined June 1976 and June 1977 data were independently
verified using July 1979 data, and equations developed
from July 1979 data were verified using August 1979 data.
Unfortunately,  the  lack  of  availability  of  MOP  fields  and 
observational  data  prevented  the  independent  verification  of 
June  equations  with  other  June  data,  and  July  equations  with 
other  July  data. 

Another scoring technique applies a scoring matrix
developed by Aldinger (1979) to the five-category
scheme. The matrix applies weights to the number of estimates
of each category in order to give some credit for
nearly correct estimates. This matrix, called the NPS awards
matrix, is further described in Section V.C.3.

In  addition,  a  distribution  measure,  called  bias,  is 
calculated  for  each  category.   Bias  represents  the  ratio  of 
the  number  of  forecasts  to  the  number  of  observations  of  each 
category.
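
The three verification measures can be computed from a contingency table of estimated versus observed categories, as in the sketch below. The Heidke skill score is written here in its standard multi-category form, (C - E)/(N - E), where C is the number of correct estimates, N the number of cases, and E the number expected correct by chance; this form is assumed and may differ in detail from the formula in Appendix D. The function name and the 3 x 3 table are hypothetical.

```python
def verification_scores(table):
    """table[i][j] = number of cases estimated as category i and observed as category j."""
    k = len(table)
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(k))
    forecasts = [sum(table[i]) for i in range(k)]                          # row totals
    observations = [sum(table[i][j] for i in range(k)) for j in range(k)]  # column totals
    expected = sum(f * o for f, o in zip(forecasts, observations)) / total
    percent_correct = 100.0 * correct / total
    heidke = (correct - expected) / (total - expected)
    # Bias per category; assumes every category was observed at least once.
    bias = [f / o for f, o in zip(forecasts, observations)]
    return percent_correct, heidke, bias

# Hypothetical three-category contingency table with 100 cases.
example = [
    [10, 5, 0],
    [10, 30, 10],
    [0, 5, 30],
]
percent_correct, heidke, bias = verification_scores(example)
```

A bias above one means a category is estimated more often than it is observed, and a bias below one means it is estimated less often.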
V. EXPERIMENTS, RESULTS, DISCUSSION

A.   CATEGORICAL  SCHEMES 

1.   Ten-Category  Scheme:   10CATA 

This  scheme  uses  ten  categories  of  the  predictand 
as  defined  below. 


Category    Observed           Visibility         Value of
Number      Visibility Code    Range (km)         Predictand (km)

I           90                 < 0.05             0.025
II          91                 0.05 to < 0.2      0.125
III         92                 0.2 to < 0.5       0.35
IV          93                 0.5 to < 1.0       0.75
V           94                 1.0 to < 2.0       1.5
VI          95                 2.0 to < 4.0       3.0
VII         96                 4.0 to < 10.0      7.0
VIII        97                 10.0 to < 20.0     15.0
IX          98                 20.0 to < 50.0     35.0
X           99                 ≥ 50.0             75.0

A Tau 0 equation was developed from combined June
1976 and June 1977 data and verified on the dependent data.
All values, except for regression coefficients, are given to
two decimal places.
Coefficient         Predictor

-354.558
+  1.346  EHF
+  0.388  BVISR
+  0.358  PS
+  5.174  SEHF1
+  1.380  ASTDR
-  2.938  VCMP1

R² = .25

Dependent Verification:   Percent Correct = 40
                          Skill Score     = .13

Category    I     II    III   IV    V     VI    VII   VIII   IX     X
Bias        .03   .01   .01   .01   .07   .19   .56   1.60   1.46   .01

The  scores  for  this  scheme  are  relatively  low.   The 
bias  values  indicate  that  the  highest  category  and  the  lowest 
six  categories  are  observed  far  more  often  than  selected  by 
the  regression  equation.   On  the  other  hand,  categories  VIII 
and  IX  were  selected  much  more  often  than  they  were  observed. 
2.   Ten-Category  Scheme:   10CATB 

It  was  felt  that  the  arbitrarily  selected  midpoint 
value  of  75.0  km  for  category  X  in  10CATA  was  too  high, 
thus  causing  a  poor  fit  of  data  in  the  regression  equation. 
Therefore,  this  category  was  changed  in  10CATB,  as  follows. 
Category    Observed           Visibility         Value of
Number      Visibility Code    Range (km)         Predictand (km)

X           99                 ≥ 50               50


All other categories, I through IX, were defined the
same as in 10CATA. The Tau 0 equation was developed from
combined June 1976 and June 1977 data and verified with the
dependent data.


Coefficient         Predictor

-303.043
+  1.165  EHF
+  0.335  BVISR
+  0.308  PS
+  4.627  SEHF1
+  1.098  ASTDR
-  2.609  VCMP1

R² = .28

Dependent  Verification:   Percent  Correct  =  39 

Skill  Score     =  .13 

Category    I    II    III    IV   V   VI    VII    VIII    IX    X 
Bias       .03   .00   .01    .01  .05  .09   .54    1.83   1.36   .00 

This equation shows some improvement over the 10CATA
equation in R² value; however, the percent correct is slightly
lower and the Heidke skill score is the same.
3. Five-Category Scheme: 5CAT

Deriving  a  regression  equation  with  fewer  categories 
should  yield  better  results  due  to  partial  compensation  of 
observer  error.   In  this  case,  five  categories  are  used 
which  correspond  to  the  probabilistic  five-category  scheme 
of Aldinger (1979).

Category    Observed            Visibility        Value of
Number      Visibility Codes    Range (km)        Predictand (km)

I           90, 91, 92          < 0.5             0.25
II          93, 94              0.5 to < 2.0      1.25
III         95, 96              2.0 to < 10.0     6.0
IV          97                  10.0 to < 20.0    15.0
V           98, 99              ≥ 20.0            35.0

The  Tau  0  equation  was  developed  from  combined  June 
1976  and  June  1977  data,  and  verified  using  both  the  dependent 
June  data  and  independent  data  from  July  1979. 


Coefficient         Predictor

+272.710
+  1.035  EHF
+  0.292  BVISR
+  0.277  PS
+  4.280  SEHF1
+  0.944  ASTDR
-  0.223  VCOMP

R² = .27

Dependent Verification:     Percent Correct = 44
                            Skill Score     = .17
Category      I       II      III     IV      V
Bias          .02     .02     .47     2.12    1.05

Independent Verification:   Percent Correct = 42
                            Skill Score     = .17
Category      I       II      III     IV      V
Bias          .03     .02     .25     .87     .49

It  is  to  be  noted  that  the  variables  selected  are  the 
same  as  those  selected  in  the  two  ten-category  schemes  with 
the  exception  that  in  this  scheme  VCOMP  was  selected  instead 
of VCMP1. The 5CAT scheme shows an increase in skill score
as  expected,  and  the  percent  correct  also  increased.   Bias 
values  here  are  not  much  better  than  those  for  10CATA  and 
10CATB  except  for  category  V  of  the  dependent  verification 
and  category  IV  of  the  independent  verification,  both  of  which 
show  values  approaching  unity. 

B.   REGRESSION  EQUATIONS 

The ultimate goal is to forecast, not just analyze, visibility.
Therefore, using the July 1979 data set and a new
set of parameters which included prognostic predictors, new
equations were developed using the 5CAT scheme. First a new
equation for Tau 0 was derived, then forecast-interval equations
for Tau 24 and Tau 48 were developed. The parameter set
used for these equations is given in Appendix A, part 2.
All  three  of  the  following  equations  were  verified  using  the 
dependent  data  and  also  verified  independently  with  data  from 
August  1979. 

1.   00-hr  Diagnostic  Equation:   5P00 

Coefficient         Predictor 

+10.137 

+  0.687  EHF  00 

+  0.488  BVISR 

-  9.018  FTER  00 


+  3.048  SEHF1  12 


R² = .30


The  two-digit  number  after  some  of  the  predictor 
parameters  indicates  the  time  interval  from  which  the 
parameter  is  derived.   Those  predictors  without  such  a  number 
are  available  at  the  analysis  time  only. 

Dependent  Verification:   Percent  Correct  =  42 

Skill  Score     =  .18 
Category    I     II     III     IV     V 
Bias         .02    .02     .90      2.27    1.07 

Independent  Verification:   Percent  Correct  =  51 

Skill  Score     =  .21 
Category    I     II     III     IV     V 
Bias         .02    .02     .99      2.00    1.10 


The R² value and verification scores of equation 5P00 are
better than those of the 5CAT equation due to the
consideration of more parameters in the July 1979 data set
than in the combined June 1976 and June 1977 data sets. The
bias values are not much different, except for category III
which shows improvement. It may be noted that all selected
parameters but one are from the analysis time, which seems
consistent with the nature of the Tau 0 equation.

An interesting fact is that the independent verification
of 5P00 yields better values than the dependent verification.
This is, in part, due to the fact that the independent
data contain a higher percentage of observations in those
high visibility categories which the equation estimates best.
In addition, the dependent data come from a large enough
sample of synoptic conditions that the regression equation
could score higher when applied to independent data which,
by chance, include a larger number of those synoptic situations
best handled by the equation.

2. 24-hr Prognostic Equation: 5P24

Coefficient         Predictor

+  0.085
+  1.077  EHF 24
+  0.440  BVISR
+  0.002  RHRX
-  7.418  FTER 24

R² = .30

Dependent Verification:     Percent Correct = 42
                            Skill Score     = .16
Category    I     II     III     IV      V
Bias        .10   .08    .56     2.26    1.16

Independent Verification:   Percent Correct = 52
                            Skill Score     = .20
Category    I     II     III     IV      V
Bias        .04   .07    .61     1.92    1.17

There is a deterioration in R² value when 5P24 is
compared to 5P00, as one might expect. The percent correct
is similar for both equations, but the Heidke skill score for
5P24 is slightly less than for 5P00. Here again, as in 5P00,
the independent verification is better than the dependent
verification.

It is to be noted that variables from Tau 24 have
entered the 5P24 equation, which is consistent with the
nature of a Tau 24 equation.

3.  48-hr Prognostic Equation:  5P48

Coefficient     Predictor

-  4.160        (constant)
+  0.390        EHF 36
+  0.555        BVISR
- 12.631        FTER 48
+  0.633        EHF 00
+  0.003        RHRSQ
-  0.160        MBVIS 48

R^2 = .27

Dependent Verification:    Percent Correct = 42
                           Skill Score     = .13
Category     I      II     III     IV      V
Bias         .01    .01    .29     2.08    1.40

Independent Verification:  Percent Correct = 52
                           Skill Score     = .16
Category     I      II     III     IV      V
Bias         .00    .01    .20     1.72    1.32

Here the R^2 value has deteriorated somewhat from the 5P00
and 5P24 cases. The percent correct is the same for equations
at all three time periods, but the Heidke skill score for 5P48
is worse than that for 5P24 and 5P00. Overall, the bias values
for 5P48 are worse than for both 5P00 and 5P24. Once again the
independent verification is better than the dependent
verification.

It is to be noted that two Tau 48 hr predictors have
entered the equation. However, there are also one Tau 36 hr
predictor and three Tau 00 hr predictors. The predictor
BVISR shows up in 5P48 as well as in 5P00 and 5P24. BVISR,
which itself is a parameterized visibility, can be considered
an indicator of the persistence of marine visibility regimes
through 48 hours.

C.   PROBABILISTIC  VS.  CATEGORICAL  APPROACH 

Aldinger (1979) used the 5CAT scheme outlined previously
and developed regression equations for the probability of
occurrence of each category. Then, using the notion of
threshold probability, the most likely category was determined.
For comparison, an equation was developed by the categorical
method of this study considering only those predictor parameters
used by Aldinger. All equations were derived from the combined
June 1976 and June 1977 data and were verified dependently.

1.  Probabilistic Equations [Aldinger, 1979]

Category   Equation

I          VISPROB = 366.262 - 1.647 SEHF + .289 RHR
                     - .369 PS + .401 VCOMP
           R^2 = .13

II         VISPROB = 738.837 - .264 EHF - .746 PS
                     + .555 RHR - 1.689 SEHF
           R^2 = .21

III        VISPROB = 266.075 + .303 WWW - .256 PS
                     + .247 RHR + .313 RHX
           R^2 = .05

IV         VISPROB = -278.669 + .365 SEHF - .643 VCOMP
                     + .431 WWW + .333 PS
           R^2 = .09

V          VISPROB = -693.510 + 3.633 EHF + .767 PS
                     - .709 VCOMP - .352 RHR
           R^2 = .21

VISPROB is the probability of occurrence of the category
for which the equation is derived.

Dependent Verification:    Percent Correct = 32
                           Skill Score     = .13
Category     I      II     III     IV      V
Bias         .04    1.53   1.10    2.08    0.40

2.  Categorical Equation

Only one categorical equation was derived. Its visibility
value (VIS) determines the visibility category: the estimate
is assigned to the category whose range contains VIS.

VIS = -302.35 + .175 EHF + .339 PS - .254 RHR + .730 SEHF
R^2 = .24

Dependent Verification:    Percent Correct = 43
                           Skill Score     = .14
Category     I      II     III     IV      V
Bias         .02    .01    .28     2.08    1.13

Comparing the two approaches shows that the categorical
approach yields a higher percent correct and a slightly
higher skill score. However, except for category V, the
biases are worse for the categorical scheme. As might be
expected, both methods use similar predictor parameters.
SEHF, RHR, PS and EHF are common to both.
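
The three verification measures compared above can be sketched
from a verification (contingency) matrix as follows. This is a
minimal illustration, not the program used in this study. It
assumes that bias is the ratio of the number of estimates to the
number of observations in each category, and that the skill score
is the standard Heidke form (hits minus chance hits, over total
minus chance hits). The function name and the 3-category counts
are hypothetical.

```python
import numpy as np

def verify(matrix):
    """Return (percent correct, Heidke skill score, per-category bias).

    matrix[i, j] = number of cases estimated in category i
                   and observed in category j.
    """
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    hits = np.trace(m)                      # correct estimates lie on the diagonal
    est_totals = m.sum(axis=1)              # estimates per category
    obs_totals = m.sum(axis=0)              # observations per category
    percent_correct = 100.0 * hits / total
    # Number of hits expected by chance from the marginal totals
    expected = (est_totals * obs_totals).sum() / total
    heidke = (hits - expected) / (total - expected)
    # Assumed definition: bias > 1 means the category is estimated too often
    bias = est_totals / obs_totals
    return percent_correct, heidke, bias

# Hypothetical 3-category verification matrix (illustration only)
example = [[10, 5, 0],
           [5, 20, 10],
           [0, 5, 45]]
pc, hss, bias = verify(example)
print(f"percent correct = {pc:.1f}, skill score = {hss:.2f}, bias = {bias.round(2)}")
```

A bias near 1.0 in every category indicates that each category is
estimated about as often as it is observed, which is why the large
category IV biases above stand out.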

3.  NPS  Awards  Matrix 

Aldinger (1979) developed an awards matrix which, when
applied to the verification matrix (Appendix E) of a
5-category scheme, gives some credit to near successes. The
Techniques Development Laboratory (TDL) of the National
Weather Service has also used an awards matrix, but of a
different nature, which does not give full credit to all
correct visibility estimates [National Weather Service, 1973].
The NPS awards matrix does give full credit to all correct
estimates. All quantities of a verification matrix are
multiplied by the corresponding percentages in the awards
matrix shown below.


                         OBSERVED CATEGORY

Estimated
Category        I      II     III     IV      V

I              100     80      0       0      0
II              80    100     25       0      0
III              0     25    100      25      0
IV               0      0     25     100     75
V                0      0      0      75    100

The verification results, after applying the awards matrix,
are as follows:

Probabilistic Approach:   Percent Correct = 60
                          Skill Score     = .27

Categorical Approach:     Percent Correct = 63
                          Skill Score     = .12

In both cases percent correct increases markedly.
However, for the probabilistic approach the skill score doubles,
while for the categorical approach the skill score decreases.
This shows that the probabilistic approach forecasts near
successes much better than the categorical approach, thus
enhancing its usefulness.
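
The application of the awards matrix to a verification matrix can
be sketched as below. This is an illustration under stated
assumptions: the awarded percent correct is taken to be the sum of
the award-weighted counts divided by the total number of cases, and
the verification counts are hypothetical. How the awarded skill
score was computed is not specified here, so it is not attempted.

```python
import numpy as np

# NPS awards matrix (percent credit), rows = estimated category I-V,
# columns = observed category I-V, as given in the text.
AWARDS = np.array([
    [100,  80,   0,   0,   0],
    [ 80, 100,  25,   0,   0],
    [  0,  25, 100,  25,   0],
    [  0,   0,  25, 100,  75],
    [  0,   0,   0,  75, 100],
], dtype=float)

def awarded_percent_correct(verification, awards=AWARDS):
    """Percent correct after giving partial credit to near successes."""
    v = np.asarray(verification, dtype=float)
    credited = (v * awards / 100.0).sum()   # element-by-element credit
    return 100.0 * credited / v.sum()

# Hypothetical verification counts (illustration only)
counts = np.zeros((5, 5))
counts[0, 0] = 10   # estimated I,   observed I  -> full credit
counts[0, 1] = 10   # estimated I,   observed II -> 80% credit
counts[2, 3] = 20   # estimated III, observed IV -> 25% credit
counts[4, 4] = 60   # estimated V,   observed V  -> full credit
score = awarded_percent_correct(counts)
print(f"awarded percent correct = {score:.1f}")
```

Without the awards matrix the same hypothetical counts would score
70 percent correct, since only the diagonal entries receive credit.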

D.   PREDICTAND  TRANSFORMATIONS 

Generally the relationship between an atmospheric predictand
and the predictors is not linear. This can lead to
less than desirable results when multiple linear regression
is used. Non-linear regression may be used to overcome this
problem, but the increased computational time involved usually
precludes its use. Another method used to solve the non-linear
problem is to transform the predictand to a form which
then relates in a more linear manner to the predictors.

Using a limited number of parameters, several transforms
were tested on the 10CATA scheme, using July 1976 and July
1977 data. The relative values of R^2 produced using each
transform are shown below.


Predictand       R^2

VIS              .230
log10(VIS)       .243
1/VIS            .037
(1/VIS)^2        .011
VIS^(1/2)        .272
VIS^(1/3)        .273
VIS^(1/4)        .267

It can be seen that the R^2 value for several of the
transformed predictands was higher than the R^2 value for
the non-transformed visibility predictand, though the
increase was not large.

However, the real test is how well an equation with a
transformed predictand verifies. So the equation derived
with the cube root of visibility as the predictand, which
yielded the highest R^2 value, was scored against the equation
with the non-transformed predictand.

Predictand = visibility
Dependent Verification:    Percent Correct = 39
                           Skill Score     = .14

Predictand = visibility^(1/3)
Dependent Verification:    Percent Correct = 27
                           Skill Score     = -.01

The results show that the transformed predictand yielded
worse scores than the unmodified visibility predictand.
This is a surprising result in view of the relative R^2 values.
It may, in part, be explained by the fact that there was an
uneven distribution of visibility observations between
categories, with a heavy weighting toward higher visibility
categories. Time limitations, however, did not permit examining
this further, and all other research was conducted using the
non-transformed predictand.
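
The comparison of R^2 across predictand transforms can be sketched
as follows. The data here are synthetic and purely illustrative
(a log-linear relationship with noise is assumed, so the logarithmic
transform is favored by construction); the function and variable
names are hypothetical and nothing below reproduces the values in
the table above.

```python
import numpy as np

def r_squared(X, y):
    """Explained variance of an ordinary least squares fit with a constant term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dev = y - y.mean()
    return 1.0 - (resid @ resid) / (dev @ dev)

# Synthetic predictors and a positive, skewed "visibility" (assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
vis = np.exp(1.5 + 0.6 * x[:, 0] - 0.4 * x[:, 1] + 0.5 * rng.normal(size=500))

transforms = {
    "VIS":        lambda v: v,
    "log10(VIS)": np.log10,
    "1/VIS":      lambda v: 1.0 / v,
    "VIS^(1/2)":  np.sqrt,
    "VIS^(1/3)":  np.cbrt,
    "VIS^(1/4)":  lambda v: v ** 0.25,
}
scores = {name: r_squared(x, f(vis)) for name, f in transforms.items()}
for name, value in scores.items():
    print(f"{name:12s} R^2 = {value:.3f}")
```

Each R^2 is measured in the units of its own transformed predictand,
so a higher value does not guarantee better categorical scores once
the estimates are converted back to visibility, which is consistent
with the verification result reported above.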
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E.   WEIGHTED  LEAST  SQUARES 

In this study the data distribution is such that most
observations occurred in the higher categories, in particular
code 98. The result of this is a regression equation
that fits the higher visibility categories better than the
lower visibility categories. As a result, low visibilities
are poorly estimated.

The  technique  of  weighted  least  squares  was  applied  in 
an  attempt  to  alleviate  this  problem.   The  goal  was  to  weight 
more  heavily  the  lower  category  cases  in  relation  to  those 
in  the  higher  categories  so  that  the  resultant  equation  would 
increase  skill  in  estimating  poor  visibilities. 

The BMDP programs [UCLA, 1979] allow case weights to be
applied. The weighted least squares technique minimizes

$\sum_j w_j (y_j - \hat{y}_j)^2$

where,

$w_j$ is the case weight for case j

$y_j$ is the observed visibility for case j

$\hat{y}_j$ is the regression estimate for case j.

Normally the weight for each case should be inversely
proportional to the variance [Daniel, 1971], but any number
of weighting techniques may be tried. In this study two
sets of case weights were tried and applied to the scheme of
10CATA.

The first scheme (WLS1) weighted each case with a weight
equal to the inverse of the predictand value, as follows.

For cases of         The predictand        And the case
observed code        value (km) is         weight (w_j) is

90                   .025                  1/.025
91                   .125                  1/.125
92                   .35                   1/.35
93                   .75                   1/.75
94                   1.5                   1/1.5
95                   3.0                   1/3.0
96                   7.0                   1/7.0
97                   15.0                  1/15.0
98                   35.0                  1/35.0
99                   75.0                  1/75.0

The resultant equation derived from combined June 1976
and June 1977 data (not given here) was verified dependently
with the following results.

R^2             = .09
Percent Correct = 7
Skill Score     = -.01

Obviously, this is a poor weighting system. The R^2 value
is very low and the scores are predictably poor.

For the second scheme (WLS2) a more reasonable set of
weights was used. The variance was computed for each category
from the unweighted equation of 10CATA. Then the weight
for each case in a particular observed category was set to
the inverse of the square root of the variance of the observed
category.

For cases of         The predictand        And the case
observed code        value (km) is         weight (w_j) is

90                   .025                  .0052
91                   .125                  .0603
92                   .35                   .0661
93                   .75                   .0615
94                   1.5                   .0702
95                   3.0                   .0700
96                   7.0                   .0754
97                   15.0                  .0941
98                   35.0                  .0925
99                   75.0                  .0242

(Each code group corresponds to a category in the 10CATA
scheme.)

The case weights shown here are somewhat contrary to what
might be expected. It would seem that the variances of the
higher categories would be larger than those of the lower
categories, if for no other reason than the fact that the
visibility ranges of the higher categories are greater. If
this were true the case weights for the higher categories
would be smaller than for the lower categories. However,
the weights shown here generally increase with an increase in
category, with the exception of category X (code 99). This
result is due to the fact that the regression equation
estimates those categories best which contain the highest number
of observations, namely the categories containing codes 97
and 98.

A comparison of dependent verification between the equations
of 10CATA and WLS2 shows very little difference.

Scheme        R^2      Percent Correct      Skill Score

10CATA        .25      40                   .13
WLS2          .23      40                   .12
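
The weighted least squares criterion used in this section can be
sketched as follows. This is a minimal illustration and not the
BMDP implementation: it assumes the fit includes a constant term,
and it uses synthetic predictors with predictand values drawn at
random from the 10CATA midpoints. Only the WLS1 weights (inverse of
the predictand value) are shown; the function name is hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Return [constant, coefficients...] minimizing sum_j w_j * (y_j - yhat_j)**2."""
    A = np.column_stack([np.ones(len(y)), X])   # add the constant term
    sw = np.sqrt(w)
    # Scaling each case by sqrt(w_j) turns the weighted problem
    # into an ordinary least squares problem.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# 10CATA predictand values (km) for codes 90-99, from the tables above
MIDPOINTS = np.array([0.025, 0.125, 0.35, 0.75, 1.5, 3.0, 7.0, 15.0, 35.0, 75.0])

# Synthetic cases (illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = rng.choice(MIDPOINTS, size=300)

w_wls1 = 1.0 / y                                  # WLS1: inverse of predictand value
coef_unweighted = weighted_least_squares(X, y, np.ones_like(y))
coef_wls1 = weighted_least_squares(X, y, w_wls1)
print("unweighted:", coef_unweighted.round(3))
print("WLS1:      ", coef_wls1.round(3))
```

Under WLS1 a code 90 case carries 3000 times the weight of a code 99
case, so the fit is dominated by the few lowest visibility cases,
which may help explain the very poor scores reported for that scheme.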

F.  DEFLATION OF R^2

According to theory, if a regression equation perfectly
fits the data from which it was developed, the explained
variance, R^2, should equal 1.0. However, it appears that
due to the nature of the categorical schemes in this study
a limit was placed on the maximum R^2 that it was possible to
achieve. This particular limit is related to the fact that
each predictand value was assumed to be the midpoint value
of the observed category, thus providing discrete visibility
values. However, the regression equation gives continuous
visibility values which are then used with the assigned
predictand values to determine R^2.

In one experiment, to demonstrate the deflation of R^2,
a regression equation of the form of the 10CATA scheme was
developed. Then, using the dependent data, the equation was
used to compute visibility values, $V_i$.

Symbolically:   $V_i = A_1 + B_1 x_{1i} + C_1 x_{2i} + \dots$

where,

V    =  visibility

x's  =  independent predictors.

These $V_i$ values were used as substitutes for the original
visibility observations. Next, using these $V_i$ values, a new
predictand, $V_i'$, was derived by re-setting each $V_i$ value
to the midpoint of the category to which it belonged. Finally,
a second regression equation was developed using the $V_i'$ as
predictand values to yield an equation of the form

$V_i'' = A_2 + B_2 x_{1i} + C_2 x_{2i} + \dots$ .

It can be seen that if the continuous values, $V_i$, had been
used as the predictand the second regression equation would
be identical to the first one and have an R^2 value of 1.0.
However, because the predictand, $V_i'$, used to develop the
second equation has discrete values as defined by the
categorical scheme, the second equation is not identical to the
first; and the R^2 value is approximately 0.7, using $V_i'$ as
the observed values.

It is believed that the R^2 value of 0.7 rather than 1.0
is the maximum value achievable in the 10CATA scheme with a
perfect equation, due to the method of defining the predictand
used in this study. The other categorical schemes,
of course, have a similar R^2 limit.

The drop of R^2 from 1.0 to 0.7 can be demonstrated
by schematic graphs. Assuming that the observed visibility
can be expressed perfectly by a regression equation, for
which R^2 = 1.0, then the graph below is the result. As
the continuous regression-estimated visibility increases
the observed visibility increases continuously also.

[Schematic: observed visibility versus visibility from
regression equation, continuous case]

However, the observed visibility is not given as a
continuous variable. Rather the visibility observations
are given as ranges or categories, and the visibility
predictand is defined as the midpoint of the observed
range, which is demonstrated schematically below.

[Schematic: observed visibility versus visibility from
regression equation, with categories I through V marked,
step-function case]

The schematic above shows a step function relationship
which indicates that as the continuous regression-estimated
visibility increases within each categorical
visibility range (given by roman numerals) the observed
visibility remains constant.

The regression-estimated visibility values have not
changed from the first schematic to the second but the
verifying "observed" values have changed from continuous
to discrete values. All observed values below a categorical
midpoint value have been increased, and values lying above
a midpoint value have been decreased.

The deterioration of R^2 which results from the second
case can be seen by noting the deviation of values along
the discrete observed visibility step function from the
continuous observed visibility line as shown below.

[Schematic: deviation (discrete - continuous) versus
visibility from regression equation]


In another experiment, an attempt was made to compute
the R^2 value for the 10CATA equation without the hindrance
of the problem just described. The BMDP programs compute
R^2 using the continuous regression-produced visibility
values and the discrete observed values. A separate program
was developed to compute R^2 by first re-setting the continuous
regression values of 10CATA to the midpoint values of the
categories to which they belong. Then, using the discrete
predictand values, a new R^2 was computed. In this case
discrete values are used for both the observations and the
regression estimates. The R^2 value computed in this way is
.31 as compared to .25 computed by the BMDP programs. All
R^2 values previously shown in this study were computed by
the method used in the BMDP programs.

The maximum R^2 value of approximately 0.7 as found by
experiment for the 10CATA scheme may be compared to the R^2
value of .31 which the 10CATA equation yielded. The difference
between the two R^2 values of approximately 40% can now
be attributed to errors in the observations and numerical
MOP's and the non-linear relationship between visibility and
associated meteorological parameters.
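
The first deflation experiment can be sketched as follows. This is
an illustration under stated assumptions and will not reproduce the
0.7 figure, which depends on the actual data. The category limits
are assumed to be those of the synoptic visibility codes 90-99 (they
are chosen so that their midpoints match the predictand values given
in section E, with an assumed upper limit of 100 km for code 99),
and the "perfect" first equation and its predictors are synthetic.

```python
import numpy as np

# Assumed lower limits (km) of code groups 90-99 and the midpoint
# predictand values used in the 10CATA scheme.
LOWER_LIMITS = np.array([0.0, 0.05, 0.2, 0.5, 1.0, 2.0, 4.0, 10.0, 20.0, 50.0])
MIDPOINTS = np.array([0.025, 0.125, 0.35, 0.75, 1.5, 3.0, 7.0, 15.0, 35.0, 75.0])

def to_midpoint(v):
    """Re-set continuous visibility values to the midpoint of their category."""
    idx = np.searchsorted(LOWER_LIMITS, v, side="right") - 1
    idx = np.clip(idx, 0, len(MIDPOINTS) - 1)   # values below 0 fall in the lowest category
    return MIDPOINTS[idx]

def r_squared(X, y):
    """Explained variance of an ordinary least squares fit with a constant term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dev = y - y.mean()
    return 1.0 - (resid @ resid) / (dev @ dev)

# Synthetic predictors and a hypothetical first equation V = A1 + B1*x1 + C1*x2
rng = np.random.default_rng(2)
x = rng.normal(size=(2000, 2))
V = 20.0 + 8.0 * x[:, 0] - 5.0 * x[:, 1]

V_prime = to_midpoint(V)                 # discrete predictand V'
r2_continuous = r_squared(x, V)          # perfect fit by construction
r2_discrete = r_squared(x, V_prime)      # deflated by the step function
print(f"R^2 with continuous V  = {r2_continuous:.3f}")
print(f"R^2 with discrete  V'  = {r2_discrete:.3f}")
```

The second value falls below 1.0 even though the predictors determine
V exactly, which is the deflation effect described in this section.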


G.   DISTRIBUTION  PROBLEM 

The distribution of observations among synoptic codes for
the combined June 1976 and June 1977 data set is shown below.
It can be noted that the highest three categories contain
66% of the observations, and the highest four categories
contain 79% of the observations. The observation distributions
are similar for the July 1979 and August 1979 data sets.

Code          Number of          Percent of
Group         Observations       Total Observations

90            75                 0.8
91            233                2.6
92            400                4.4
93            740                8.1
94            166                1.8
95            327                3.6
96            1125               12.3
97            1911               21.0
98            3642               39.9
99            495                5.4

This distribution tended to tune all the regression equations
to the high categories, such that high categories were estimated
relatively well by the regression equations and low visibility
categories were estimated poorly. This is contrary
to what is desired, since forecasts of low visibility
are very important operationally.

The probabilistic approach does not have a similar
distribution problem, since one regression equation is developed
for each visibility category and depends only on the
observations of a single category.

H.   BETA  VISIBILITY 

The beta visibility was previously described. Its computation
is given in Appendix B.3. Beta visibility is not only
a parameter for use in visibility regression equations but
itself yields a value of visibility which may be of use.
This section attempts to quantify its usefulness.

The BMDP programs were used to compute a correlation
coefficient between the predictand and the various forms of
the beta visibility parameter. It is to be noted that the
visibility predictand is not a directly observed visibility
value, but rather it is the midpoint value of an observed
visibility range. The correlation coefficients, R, between
the various forms of the beta visibility parameter and the
visibility predictand of the 5CAT scheme are given in the
following tables. A comparison of maximum, minimum and mean
values is also given. These statistics were derived using
the July 1979 data set.


Comparative Statistics and Correlation to the Visibility
Predictand (VIS) at Tau 0 hr

                 Maximum (km)   Minimum (km)   Mean (km)     R

VIS (Tau 0)      35.0           0.25           19.2          1.00
BVISR            46.9           0.56           14.3          0.43
BVISX            51.9           0.79           19.9          0.09


Comparative Statistics and Correlation to the Visibility
Predictand (VIS) at Tau 0+24 hr

                 Maximum (km)   Minimum (km)   Mean (km)     R

VIS (Tau 24)     35.0           0.25           19.0          1.00
BVISR            48.7           0.51           14.3          0.31
BVISX            51.9           0.79           20.0          0.10
MBVIS 24         44.4           1.68           17.2          0.05


Comparative Statistics and Correlation to the Visibility
Predictand (VIS) at Tau 0+48 hr

                 Maximum (km)   Minimum (km)   Mean (km)     R

VIS (Tau 48)     35.0           0.25           18.8          1.00
BVISR            52.1           0.42           14.3          0.24
BVISX            51.9           0.62           20.0          0.06
MBVIS 48         50.1           2.14           15.4          0.02

It should be noted that in the tables the analysis-time
parameters BVISR and BVISX are compared to the
predictand at all three time periods. The tables show
that the maximum, minimum and mean values of all the beta
visibility parameters are similar to the corresponding
values of the visibility predictand at each time period.
BVISR shows a higher correlation to the predictand than
BVISX at all time periods, though the correlations
generally weaken with time. Both the
analysis-time parameters BVISR and BVISX show higher
correlation to the predictand at Tau 24 hr than the
prognostic-time parameter MBVIS 24. The same is true at
Tau 48 hr when comparing BVISR and BVISX to MBVIS 48.

The following clarifies the reason for the slight
differences in maximum, minimum and mean values for the
same parameter at different time periods. The Tau 24 hr
data includes values from the first day of August (i.e.,
up to 24 hrs after the last day of the July data set), and
omits values from the first day in July. In like manner,
the Tau 48 hr data includes the first two days of August
and omits the first two days of July. Thus the data set for
each time period is slightly different.

In  addition,  a  skill  score  was  computed  for  BVISR  and 
BVISX  by  determining  the  code  group  to  which  the  computed 
beta  visibility  belonged,  and  comparing  that  to  the  observed 
code  groups  in  the  combined  June  1976  and  June  1977  data. 

             Heidke Skill Score      Percent Correct

BVISR        0.10                    33
BVISX        0.07                    31

It can be concluded from these results that although beta
visibility is a useful predictor parameter for regression
analysis, it has quite limited skill when used to estimate
visibility by itself.

I.   COMMENTS  ON  EXPLAINED  VARIANCE 

The total explained variance, R^2, of a multiple linear
regression equation is a measure of how well the dependent
variable (predictand) can be approximated by a linear
combination of independent variables (predictors). The higher
the value of R^2, the better the approximation is. A perfect
linear relationship results in an R^2 value of 1.0. However,
it should be noted that R^2 indicates only how well a given
equation will estimate a given predictand if one uses the
method of least squares. This method results in a regression
equation which minimizes the value of the sum of squares
of the estimate errors (estimate error = estimated value minus
observed value). An equation with a given R^2 will not
necessarily provide a better estimate of the predictand than
an equation with a smaller R^2 when evaluated by some method
other than least squares. An entirely different situation
may occur if one applies the derived regression equation to
independent data. Though the original equation may be a
good fitting equation for the dependent data (by the least
squares criterion) it may be a poor fit for the independent
data, especially if the number of cases is small. In this
study the sample size of over 4000 cases is large enough that
a drastic drop in estimation ability is not to be expected
when independent data are applied; however, some deterioration
was encountered.

Also, as additional predictors are entered into an equation
by the stepwise process the R^2 value will increase, but
an equation with fewer predictors and a lower R^2 may, in fact,
provide a better estimate when applied to independent data.
This is so because, as more variables enter into an equation,
it becomes more likely that the equation will reflect
relationships unique to the dependent data. Thus extra variables
may degrade an equation when scored on independent data [Air
Weather Service, 1977]. Of course, the application of
independent data may also show an improvement in scores due to
the peculiarities of a particular data set. However, some
form of truncation method should be used to limit the number
of variables in an equation, such as was done in this study.

An experiment to demonstrate the relationship of score
to number of predictors in the equation was performed, using
the regression results of the 5CAT scheme. Truncating the
5CAT scheme at different steps yielded the following.


            Dependent Data                      Independent Data

Step    R^2     Skill Score   % Correct     Skill Score   % Correct

1       .166    .123          40.4          .128          39.5
2       .219    .149          42.7          .173          41.8
3       .245    .153          44.0          .179          42.7
4       .256    .151          43.2          .178          43.2
5       .262    .167          43.8          .179          42.7
6       .269    .174          44.0          .165          41.9
7       .272    .166          44.4          .156          41.2
8       .275    .174          44.0          .163          40.9

It can be seen that after a certain point the direct
relationship between R^2 and skill becomes obscure. In
this study the equation for the 5CAT scheme as described
in the text was truncated after the sixth step, for at
the seventh step the R^2 failed to increase by a rounded
value of 1%.

It is encouraging to note that the results above show
that percent correct and skill score do not substantially
decrease when independent data are applied compared to when
dependent data are applied. In fact, the skill score is
relatively better in the former instance for the first
five steps.

J.   DISCUSSION  OF  ERRORS 

It  is  believed  by  the  author  that  the  techniques  used 
in  this  study  would  yield  equations  of  high  operational 
usefulness  if  it  were  not  for  various  unavoidable  errors. 
Linear  regression  assumes,  for  example,  that  all  predictand 
values  used  are  errorless.   This  is  far  from  true  here. 
Observer  error  in  estimating  visibility  at  sea  is  relatively 
high,  due  mostly  to  a  dearth  of  visibility  markers  at  sea 
and  also  due  to  the  fact  that  many  ships  transmitting 
synoptic  reports  may  have  observers  with  little  or  no 
observational  training  and/or  experience. 

Errors also enter into the Model Output Parameters (MOP's),
which are only as good as the numerical models from which
they are generated, analyses being better than prognoses.
The method used to interpolate the MOP's to the synoptic
ship positions also adds error to the scheme.

VI.  CONCLUSIONS AND RECOMMENDATIONS

The categorical approach used in this study yielded
visibility equations which have comparable skill at both
analysis and prognostic times, which is a promising result.
However, the actual skill of the equations is relatively
poor and not operationally useful at this time. The
reason for this is believed to lie in the errors
of visibility observations, the non-linear relationship
between the predictand and the predictors, and the errors
of the numerically generated MOP's. The future promises much
improvement due to new statistical techniques, improved
numerical models and the identification of more air/ocean
parameters with a known relation to visibility.

The  comparison  of  the  probabilistic  to  the  categorical 
approach  indicates  that  the  probabilistic  approach  holds 
more  promise,  at  least  partly  due  to  the  fact  that  the 
categorical  approach  is  hindered  by  the  uneven  distribution 
of  observations.   The  probabilistic  approach  seems  to 
estimate  near  successes  better  than  the  categorical 
approach. 

Parameters found to be most highly related to visibility
in the regression equations are:  evaporative heat flux,
beta visibility, sea level pressure, sensible plus
evaporative heat flux, air/sea temperature difference,
meridional component of the wind, relative humidity
parameters and FNOC's fog probability parameter.

The  following  recommendations  are  offered  for  future 
research: 

1.  Test  new  parameters  in  relation  to  visibility, 
such  as  some  type  of  visibility  persistence  parameter, 
more  interactive,  modified  and  binary  parameters,  and  a 
climatological  parameter  now  being  developed  for  the 
North  Pacific  by  the  National  Climatic  Center. 

2.  Investigate  further  the  techniques  of  weighted 
least  squares  and  transformation  of  the  predictand  to 
relate  more  closely  to  the  non-linear  nature  of  the 
problem. 

3.  Stratify the data with respect to critical values
of geography and to various MOP's.

4.  Investigate  the  use  of  discriminant  analysis  to 
estimate  visibility. 

5.  Stress  the  probabilistic  approach  over  the 
categorical  approach,  and  in  particular,  expand  the 
work  of  Aldinger  [1979]  to  include  additional  parameters 
and  prognostic  equations.

APPENDIX  A
PREDICTOR  PARAMETER  DESCRIPTIONS 

Part  1.   This  part  consists  of  all  predictor  parameters 
considered  for  use  in  the  analysis-time  equations 
developed  from  the  combined  June  1976  and  June  1977  data 
set. 


NOTES:

[**]  Denotes those predictor parameters that repeatedly were
      selected early by the stepwise regression, thereby implying
      their relatively strong relationship with visibility.

[*]   Denotes those predictor parameters that only occasionally or
      never were selected early by the stepwise regression, but may
      be useful in future studies.

[-]   Denotes those predictor parameters that seemed to have little
      or no relation to visibility in this study.


SYMBOL    DESCRIPTIVE NAME                                UNITS


A.   Analysis  Parameters  (FNOC  Mass  Structure  Model) 

PS        Sea-level  Pressure  [**]  (mb) 

TAIR      Surface  Air  Temperature  [*]  (°C) 

EAIR      Surface  Vapor  Pressure  [*]  (mb) 

T925       925  mb  Air  Temperature  [*]  (°C) 

TSEA      Sea-Surface  Temperature  [*]  (°C)

B.   Prognostic Parameters (FNOC Primitive Equation Model)

TX        Surface Air Temperature [*]  (°C)
          Derived from surface air and potential temperatures,
          boundary layer depth, upper-level winds extrapolated to
          surface, air density, drag coefficient, gustiness factor
          and empirical constants.

EX        Surface Vapor Pressure [*]  (mb)
          Derived from model's mixing ratio.

SOLARAD   Solar Radiation [*]  (gcal/cm²/hr)
          Calculated absorption of incoming short-wave (solar)
          radiation (positive downward).

EHF       Evaporative Heat Flux [**]  (gcal/cm²/hr)
          Derived using air density, drag coefficient, extrapolated
          winds, and mixing ratios.

SHF       Sensible Heat Flux [*]  (gcal/cm²/hr)
          Recovered from SHF = SEHF-EHF.
          Originally derived by FNOC using drag coefficient,
          extrapolated winds, surface air temperature, TX, density
          and constants.

SEHF      Sensible Plus Evaporative Heat Flux [**]  (gcal/cm²/hr)
          SEHF = SHF+EHF

THF       Total Heat Flux [*]  (gcal/cm²/hr)
          THF = SEHF-SOLARAD+LW,
          where LW is the heating due to long-wave (terrestrial)
          radiation.


C.   Marine  Wind  Model  (FNOC) 


WWW      Marine  Wind  Speed  [*]  (kt) 

(DDWW)     Marine  Wind  Direction  (deg/10) 

This  variable  was  not  used  as  a 
predictor  parameter,  but  rather 
to  derive  other  parameters. 


D.   Derived Parameters

UCOMP     Zonal Wind Component [*]  (m/sec)
          UCOMP = -WWW • sin(DDWW • 10)

VCOMP     Meridional Wind Component [**]  (m/sec)
          VCOMP = -WWW • cos(DDWW • 10)

CAPU      I Directional Wind Component [*]  (m/sec)
          CAPU = -UCOMP • sin(LNGA) - VCOMP • cos(LNGA)
          [Haltiner, 1971], where
          LNGA = -10 - (I,J point longitude).

CAPV      J Directional Wind Component [*]  (m/sec)
          CAPV = VCOMP • cos(LNGA) - VCOMP • sin(LNGA)
          [Haltiner, 1971], where
          LNGA = -10 - (I,J point longitude).

THETAX    Potential Temperature X [-]  (°K)
          Derived using PS, TX.

THETAR    Potential Temperature R [-]  (°K)
          Derived using PS, TAIR.

STABX     Stability X [-]  (°K/mb)
          Derived using [THETAX - (THETA from T925)]/(PS-925).

STABR     Stability R [-]  (°K/mb)
          Derived using [THETAR - (THETA from T925)]/(PS-925).

ASTDX     Air-Sea Temperature Difference X [**]  (°C)
          ASTDX = TX-TSEA

ASTDR     Air-Sea Temperature Difference R [**]  (°C)
          ASTDR = TAIR-TSEA

ADTSEA    Advection of TSEA [*]  (°C/hr)
          See Appendix B.1.

ADTX      Advection of TX [*]  (°C/hr)
          See Appendix B.1.

ADTAIR    Advection of TAIR [-]  (°C/hr)
          See Appendix B.1.

AASTDX    Advection of ASTDX [-]  (°C/hr)
          See Appendix B.1.

AASTDR    Advection of ASTDR [*]  (°C/hr)
          See Appendix B.1.

RHR       Relative Humidity R [**]  (%)
          See Appendix B.2.

RHX       Relative Humidity X [**]  (%)
          See Appendix B.2.
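The wind-component definitions for UCOMP and VCOMP can be illustrated with a minimal Python sketch. It assumes that DDWW is the direction the wind blows from, coded in tens of degrees (so that multiplying by 10 gives degrees), and that the trigonometric functions take that angle in degrees. The function name `wind_components` is hypothetical, and no unit conversion between knots and m/sec is attempted.

```python
import math


def wind_components(www, ddww):
    """Return (UCOMP, VCOMP) for a wind speed and a direction code.

    www  : marine wind speed (result is in the same unit as the input)
    ddww : direction the wind blows FROM, in tens of degrees (assumed coding)
    """
    direction_rad = math.radians(ddww * 10.0)
    ucomp = -www * math.sin(direction_rad)  # zonal component, positive eastward
    vcomp = -www * math.cos(direction_rad)  # meridional component, positive northward
    return ucomp, vcomp


if __name__ == "__main__":
    # A westerly wind (from 270 degrees) should be purely zonal and positive.
    print(wind_components(10.0, 27))
```

With this sign convention a northerly wind (DDWW = 36) yields a negative VCOMP, i.e. air moving southward.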


E.   Interactive  and  Modified  Parameters 

RHRX  =  RHR  •  RHX  [**] 

RVCOMP  =  RHR  •  VCOMP  [-] 

RHRPS  =  RHR  •  PS  [-] 

RASTDX  =  RHR  •  ASTDX  [**] 

RSEHF  =  RHR  •  SEHF  [-]

PDSQ  =  (PS-1014.8)²  [-]

PSRHX  =  PS  •  RHX  [-] 

PSSEHF  =  PS  •  SEHF  [-] 

PASTDX  =  PS  •  ASTDX  [*] 

PSVCMP  =  PS  •  VCOMP  [-]

VSEHF  =  VCOMP  •  SEHF  [-] 

EHFADT  =  EHF  •  ADTAIR  [-] 

ESEHF  =  EHF  •  SEHF

EXEAIR  =  EX  •  EAIR  [-] 

SEVCMP  =  SEHF  •  VCOMP  [-] 

SEADTX  =  SEHF  •  ASTDX  [-] 

SERHX  =  SEHF  •  RHX  [-] 

ASTDRX  =  ASTDR  •  ASTDX  [*] 

UVCOMP  =  UCOMP  •  VCOMP  [*] 

CAPUV  =  CAPU  •  CAPV  [*] 

TARSEA  =  TAIR  •  TSEA  [-]

TXAIR  =  TX  •  TAIR  [-]

SEHFSQ  =  SEHF  •  SEHF  [-]

EHFSQ  =  EHF  •  EHF  [-]

RHRSQ  =  RHR  •  RHR  [**] 

RHXSQ  =  RHX  •  RHX  [*] 

VCMPSQ  =  VCOMP  •  VCOMP  [-] 

CAPUSQ  =  CAPU  •  CAPU  [*] 

TSEASQ  =  TSEA  •  TSEA  [-] 

ASDXSQ  =  ASTDX  •  ASTDX  [**]

ASDRSQ  =  ASTDR  •  ASTDR  [*] 

ADSESQ  =  ADTSEA  •  ADTSEA  [-] 

PSSQ  =  PS  •  PS  [-] 

SREHF  Square  root  of  EHF  [*] 

SRPS  Square  root  of  PS  [*] 

SRASTR  Square  root  of  ASTDR  [-] 

SRASTX  Square  root  of  ASTDRX  [-] 

SRSEHF  Square  root  of  SEHF  [*] 

SRRHR  Square  root  of  RHR  [-] 

SRRHX  Square  root  of  RHX  [-] 

SRCAPU  Square  root  of  CAPU  [-] 

SRTSEA  Square  root  of  TSEA  [-] 

SRVCMP  Square  root  of  VCOMP  [-] 

SRASEA  Square  root  of  ADTSEA  [*] 


F.   Binary Parameters

EHF1      if EHF < 1.75 or EHF > 8.75;    EHF1 = 0.0    [-]
          if 1.75 ≤ EHF ≤ 8.75;           EHF1 = 1.0

EHF2      if EHF < 3.34;                  EHF2 = 0.0    [*]
          if EHF > 3.34;                  EHF2 = 1.0

EHF3      if EHF < 0.0;                   EHF3 = 0.0    [-]
          if EHF > 0.0;                   EHF3 = 1.0

PS1       if PS < 1000 or PS > 1030;      PS1 = 0.0     [-]
          if 1000 < PS < 1030;            PS1 = 1.0

PS2       if PS < 1014.8;                 PS2 = 0.0     [-]
          if PS > 1014.8;                 PS2 = 1.0

RHR1      if RHR < 60;                    RHR1 = 0.0    [-]
          if RHR ≥ 60;                    RHR1 = 1.0

RHR2      if RHR < 83;                    RHR2 = 0.0    [**]
          if RHR ≥ 83;                    RHR2 = 1.0

SEHF1     if SEHF < 0.0;                  SEHF1 = 0.0   [-]
          if SEHF > 0.0;                  SEHF1 = 1.0

ASDX1     if ASTDX < 0.0;                 ASDX1 = 0.0   [-]
          if ASTDX > 0.0;                 ASDX1 = 1.0

ASDR1     if ASTDR < 0.0;                 ASDR1 = 0.0   [**]
          if ASTDR > 0.0;                 ASDR1 = 1.0

VCMP1     if VCOMP < 0.0;                 VCMP1 = 0.0   [-]
          if VCOMP > 0.0;                 VCMP1 = 1.0

UCMP1     if UCOMP < 0.0;                 UCMP1 = 0.0   [-]
          if UCOMP > 0.0;                 UCMP1 = 1.0

STABX1    if STABX < 0.0;                 STABX1 = 0.0  [-]
          if STABX > 0.0;                 STABX1 = 1.0

STABR1    if STABR < 0.0;                 STABR1 = 0.0  [-]
          if STABR > 0.0;                 STABR1 = 1.0


G.   Other Parameters

FTER      FNOC Fog Probability Parameter [**]  (%)

BVISR     Beta Visibility Parameter R [**]  (km)
          See Appendix B.3.

BVISX     Beta Visibility Parameter X [*]  (km)
          See Appendix B.3.

Part  2.   This  part  consists  of  all  predictor  parameters
considered  for  use  in  the  analysis-time  and  forecast- 
interval  equations  developed  from  the  July  1979  data. 
In  this  list  some  parameters  not  found  useful  in  the 
June  regression  runs  were  eliminated,  but  additional 
parameters  which  were  available  for  the  July  data  set 
were  added. 

A.   Predictors  used  to  develop  equations  both  from  June 
and  from  July  data  (described  in  Part  1) 

(1)  Parameters  available  for  Tau  00,  12,  24,  36
and  48  hr

PS  T925  TX 

EX  EHF  SHF 

SEHF  THF  WWW 

UCOMP  VCOMP  RHX 

EHF2  SEHF1  VCMP1

FTER  UVCOMP 

(2)  Parameters  available  for  Tau  00  hr  only 
TAIR               EAIR  TSEA 
ASTDX              ASTDR  RHR 
ASTDRX             ASDXSQ            RASTDX 
RHRX               RHRSQ  BVISR 
BVISX

B.   Additional variables available in the July 1979 data set

SYMBOL    DESCRIPTIVE NAME                                UNITS

CLIMO     National Climatic Center Fog Frequency
          Climatology [*]  (%/100)

SSANOM    Sea Surface Temperature Anomaly [*]  (°C)
          Available at Tau 00 hr

U925      U Wind component at 925 mb [*]  (kt)
          Available at Tau 00, 12, 24, 36, 48 hr

V925      V Wind component at 925 mb [*]  (kt)
          Available at Tau 00, 12, 24, 36, 48 hr

E925      Vapor pressure at 925 mb [*]  (mb)
          Available at Tau 12, 24, 36, 48 hr

GGTHTA    Front Location Parameter [*]  (°K/(100 km))
          Available at Tau 00, 12, 24, 36, 48 hr

NCLOUD    Total Cloud Cover [*]  (tenths)
          Available at Tau 00, 12, 24, 36, 48 hr

MBVIS     Modified beta visibility [**]  (km)
          See Appendix B.3.
          Available at Tau 12, 24, 36, 48 hr

RASTDR    = RHR • ASTDR [*]  (°C %)
          Available at Tau 00 hr

H510      1000 mb - 500 mb D-value thickness [*]  (cm)
          Available at Tau 00, 12, 24, 36, 48 hr

APPENDIX  B
MISCELLANEOUS  PARAMETER  FORMULATIONS 

1.   Advection Parameters

All advection parameters use the following general
formulation.

For the advection of a quantity (Q) the formula
ADQ = -V • ∇Q was used in the finite difference form:

\[
\mathrm{ADQ} = -\frac{\mathrm{RMAP}}{\mathrm{DM}}
\left[ \mathrm{CAPU} \cdot (Q_{i+1} - Q_{i-1})_j
     + \mathrm{CAPV} \cdot (Q_{j+1} - Q_{j-1})_i \right],
\]

where   RMAP = (1 + sin 60°)/(1 + sin(latitude))
and     DM   = [2 • (6.37 • 10^6) • (1 + sin 60°)]/31.205
(31.205 = number of grid mesh lengths, pole to equator,
on the FNOC I,J grid).
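The finite-difference advection can be sketched in Python as follows. This is only an illustration under stated assumptions: the field `q` is taken to be a nested list indexed `q[i][j]`, the divisor in the finite-difference form is read as DM, CAPU and CAPV are taken in m/sec so that the result is per second (no conversion to the per-hour units of the parameter list is attempted), and the names `map_factor` and `advection` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6.37e6                 # 6.37 x 10^6, as used in DM
MESH_LENGTHS_POLE_TO_EQUATOR = 31.205   # FNOC I,J grid
SIN_60 = math.sin(math.radians(60.0))

# DM equals two grid lengths, the span of a centered difference.
DM = 2.0 * EARTH_RADIUS_M * (1.0 + SIN_60) / MESH_LENGTHS_POLE_TO_EQUATOR


def map_factor(latitude_deg):
    """RMAP = (1 + sin 60) / (1 + sin(latitude))."""
    return (1.0 + SIN_60) / (1.0 + math.sin(math.radians(latitude_deg)))


def advection(q, capu, capv, i, j, latitude_deg):
    """Centered-difference advection of q at interior grid point (i, j)."""
    dq_di = q[i + 1][j] - q[i - 1][j]
    dq_dj = q[i][j + 1] - q[i][j - 1]
    return -(map_factor(latitude_deg) / DM) * (capu * dq_di + capv * dq_dj)


if __name__ == "__main__":
    # Hypothetical 3x3 field increasing by one unit per grid step in I.
    field = [[float(i) for _ in range(3)] for i in range(3)]
    print(advection(field, 1.0, 0.0, 1, 1, 60.0))
```

At 60°N the map factor is exactly 1, so a wind of 1 m/sec blowing up a gradient of one unit per grid length produces an advection of -2/DM per second.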


2.   Relative  Humidity  Parameters 

The thermodynamic equation for calculation of
saturation vapor pressure, known as the Clausius-Clapeyron
equation, is given as

\[
\frac{1}{e_s}\frac{de_s}{dT} = \frac{L(T)}{RT^2},   \qquad (1)
\]

where

R     =  specific gas constant for water vapor
         (0.461 joule g^-1 °K^-1)
T     =  temperature (°K)
L(T)  =  latent heat of vaporization of water (joule g^-1)
e_s   =  saturation vapor pressure.


This describes the behavior of e_s as a function of T,
assuming water vapor to be an ideal gas. It cannot be
integrated exactly to give e_s as a function of T, since
L(T) is not known to sufficient accuracy at more than a
few temperatures [Weinreb, 1971].

The Goff/Gratch formula (Eq. 2) is an approximate
solution of Eq. (1) considering the deviations from a
perfect gas based on modern experimental data [List, 1963]:

\[
\begin{aligned}
\log_{10} e_s = {} & -7.90298\,(T_s/T - 1) + 5.02808\,\log_{10}(T_s/T) \\
  & - 1.3816 \times 10^{-7}\,\bigl(10^{\,11.344\,(1 - T/T_s)} - 1\bigr) \\
  & + 8.1328 \times 10^{-3}\,\bigl(10^{\,-3.49149\,(T_s/T - 1)} - 1\bigr) \\
  & + \log_{10} e_{ws},
\end{aligned}
\qquad (2)
\]

where

T_s   =  steam point temperature (373.16°K)
T     =  absolute (thermodynamic) temperature (°K)
e_s   =  saturation vapor pressure over a plane surface
         of pure ordinary liquid water (mb)
e_ws  =  saturation pressure of pure ordinary liquid
         water at steam point pressure (mb).

Two saturation vapor pressures were calculated for
each grid point using (a) the analysis-model field,
giving ESAIR, and (b) the prognostic-model field, giving
ESX. Then relative humidity parameters were calculated
as follows:

\[
\mathrm{RHR} = \frac{\mathrm{EAIR}}{\mathrm{ESAIR}} \cdot 100
\qquad \text{and} \qquad
\mathrm{RHX} = \frac{\mathrm{EX}}{\mathrm{ESX}} \cdot 100 .
\]
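As an illustration of the relative humidity calculation, the following Python sketch evaluates the Goff/Gratch formula and forms the ratio of vapor pressure to saturation vapor pressure. Several items are assumptions rather than details given in the text: the coefficients 1.3816 x 10^-7 and 11.344 are the standard values of the published Goff/Gratch formula, e_ws is set to 1013.246 mb (one standard atmosphere), Celsius is converted to °K by adding 273.16 for consistency with the 373.16°K steam point, and the function names are hypothetical.

```python
import math

STEAM_POINT_K = 373.16   # T_s
E_WS_MB = 1013.246       # assumed value of e_ws (one standard atmosphere)


def saturation_vapor_pressure_mb(temp_k):
    """Goff/Gratch saturation vapor pressure over liquid water (mb)."""
    ratio = STEAM_POINT_K / temp_k
    log10_es = (
        -7.90298 * (ratio - 1.0)
        + 5.02808 * math.log10(ratio)
        - 1.3816e-7 * (10.0 ** (11.344 * (1.0 - temp_k / STEAM_POINT_K)) - 1.0)
        + 8.1328e-3 * (10.0 ** (-3.49149 * (ratio - 1.0)) - 1.0)
        + math.log10(E_WS_MB)
    )
    return 10.0 ** log10_es


def relative_humidity_percent(vapor_pressure_mb, temp_c):
    """RH = (vapor pressure / saturation vapor pressure) * 100."""
    es = saturation_vapor_pressure_mb(temp_c + 273.16)
    return vapor_pressure_mb / es * 100.0


if __name__ == "__main__":
    # Hypothetical inputs: EAIR = 12 mb at TAIR = 15 degrees C.
    print(relative_humidity_percent(12.0, 15.0))
```

Calling `relative_humidity_percent(EAIR, TAIR)` corresponds to RHR and `relative_humidity_percent(EX, TX)` to RHX.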


3.   Beta Visibility Parameter

The computation of this parameter starts with the
production of an extinction coefficient, $\beta$, which is a
function of windspeed and relative humidity:

\[
\beta = F(\mathrm{WWW}) \cdot F(\mathrm{RHR\ or\ RHX}),
\]

where WWW = surface windspeed (m/sec) and
RHR or RHX = relative humidity, and

\[
F(x) = A_1 + x(A_2 + x(A_3 + x(A_4 + x(A_5 + A_6 x)))).
\]

If the relative humidity input has a value greater than
99.5 then it is set equal to 99.5.

The coefficients are as follows:

For WWW < 7 m/sec

          WWW                     RHR or RHX
A1         0.8065629              -0.4072407 x 10^1
A2         0.4852030 x 10^-1       0.3865717
A3         0.5359734 x 10^-2      -0.1405736 x 10^-1
A4         0.0                     0.2496362 x 10^-3
A5         0.0                    -0.216801 x 10^-5
A6         0.0                     0.7388672 x 10^-8

For WWW ≥ 7 m/sec

          WWW                     RHR or RHX
A1        -0.8504248 x 10^1       -0.6135706 x 10^1
A2         0.3782149 x 10^1        0.583962
A3        -0.6052896              -0.214833 x 10^-1
A4         0.4835776 x 10^-1       0.3777016 x 10^-3
A5        -0.1915719 x 10^-2      -0.328404 x 10^-5
A6         0.3078907 x 10^-4       0.1120986 x 10^-7

Next, a new extinction coefficient is computed as
$\beta_{TOT} = \beta + S$, where S is given as follows:

S         Present Weather Code
0.0       < 50
0.35      50-59
0.2       60, 61, 80
0.6       62, 63, 81
1.19      64, 65, 82

The scheme does not apply if weather codes other than
those listed above are observed. The weather codes are
defined in the Federal Meteorological Handbook No. 2
[U.S. Departments of Commerce, Defense, and Transportation,
1969].

Next, beta visibility is computed by

\[
\mathrm{BVISR} = \frac{3.91}{\beta_{TOT}}, \quad \text{using RHR, and}
\qquad
\mathrm{BVISX} = \frac{3.91}{\beta_{TOT}}, \quad \text{using RHX.}
\]

The modified beta visibility for use with prognostic times
is computed without the weather code input by using the
formula

\[
\mathrm{MBVIS} = \frac{3.91}{\beta},
\]

and here RHX only is used for the relative humidity input.
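The beta visibility steps can be sketched in Python as shown below. The sketch assumes that the two polynomial factors are multiplied to form the extinction coefficient, and it takes the coefficient lists as arguments rather than embedding the tabulated values, so the caller chooses the set for WWW below or at and above 7 m/sec. The function names (`horner`, `weather_increment`, `beta_visibility`) are hypothetical.

```python
def horner(coefficients, x):
    """Evaluate A1 + x(A2 + x(A3 + ...)) for coefficients [A1, A2, ...]."""
    result = 0.0
    for a in reversed(coefficients):
        result = result * x + a
    return result


def weather_increment(code):
    """Return S for a present weather code, or None if the scheme does not apply."""
    if code < 50:
        return 0.0
    if 50 <= code <= 59:
        return 0.35
    if code in (60, 61, 80):
        return 0.2
    if code in (62, 63, 81):
        return 0.6
    if code in (64, 65, 82):
        return 1.19
    return None


def beta_visibility(wind_coeffs, rh_coeffs, www, rh, weather_code=None):
    """Visibility (km) as 3.91 / beta; the weather code is optional (MBVIS omits it)."""
    rh = min(rh, 99.5)  # relative humidity input is capped at 99.5
    beta = horner(wind_coeffs, www) * horner(rh_coeffs, rh)  # product is assumed
    if weather_code is not None:
        s = weather_increment(weather_code)
        if s is None:
            return None  # weather code outside the listed ranges
        beta += s
    return 3.91 / beta


if __name__ == "__main__":
    # Hypothetical one-term coefficient lists, chosen only to show the call.
    print(beta_visibility([2.0], [0.5], 5.0, 80.0, weather_code=55))
```

Leaving `weather_code` as `None` corresponds to the modified beta visibility, which omits the weather code term.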

APPENDIX C
STATISTICS

1.   The coefficient of determination, $R^2$, may be
interpreted as the proportion of the variance of the
predictand that is explained by the regression equation.
The computation of $R^2$ follows [Hill, 1979].

$Y_i$  =  observed value of the dependent variable for case i
$\hat{Y}_i$  =  regression-specified value for case i
$\bar{Y}$  =  mean of the dependent variable
$(Y_i - \hat{Y}_i)$  =  residual for case i, also called forecast error
$\sum_i (Y_i - \hat{Y}_i)^2$  =  sum of squares about the regression line
$\sum_i (Y_i - \bar{Y})^2$  =  sum of squares of deviations about the mean
$R$  =  correlation coefficient between $Y_i$ and $\hat{Y}_i$
$R^2$  =  proportion of the variance of $Y_i$ that is
          "explained" by using $\hat{Y}_i$, or

\[
R^2 = \frac{\sum_i (Y_i - \bar{Y})^2 - \sum_i (Y_i - \hat{Y}_i)^2}
           {\sum_i (Y_i - \bar{Y})^2}
\]
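A short Python sketch of the explained-variance computation follows. It takes the observed values and the regression-specified values as plain lists; the function name `r_squared` is hypothetical.

```python
def r_squared(observed, estimated):
    """Proportion of the variance of the observations explained by the estimates."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Sum of squares of deviations about the mean.
    ss_mean = sum((y - mean_obs) ** 2 for y in observed)
    # Sum of squares about the regression line (squared residuals).
    ss_resid = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))
    return (ss_mean - ss_resid) / ss_mean


if __name__ == "__main__":
    # Hypothetical observations and estimates.
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Perfect estimates give a value of 1, and estimates no better than the mean of the observations give 0.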

2.   The F-to-Enter criterion used to enter variables in
the stepwise regression procedure is given as follows
[Hill, 1979].

For each independent variable, $X_k$, that is not in
the equation at step (j + 1) (j variables have already
entered the equation):

\[
\text{F-to-Enter} =
\frac{\sum_i (\text{residuals at step } j)^2
      - \sum_i (\text{residuals at step } (j+1) \text{ with } X_k \text{ in the equation})^2}
     {\sum_i (\text{residuals at step } (j+1) \text{ with } X_k \text{ in the equation})^2 \,/\, (n - j - 2)}
\]

n = number of cases

The F-to-Enter statistic is generally a measure of
the importance of one variable relative to another.
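The F-to-Enter ratio can be written as a small Python function. This sketch takes the two residual sums of squares as already computed (before and after the candidate variable enters); the function and argument names are hypothetical.

```python
def f_to_enter(rss_step_j, rss_step_j_plus_1, n_cases, j):
    """F-to-Enter for a candidate variable.

    rss_step_j        : sum of squared residuals with j variables in the equation
    rss_step_j_plus_1 : sum of squared residuals after the candidate is added
    n_cases           : number of cases, n
    j                 : number of variables already in the equation
    """
    reduction = rss_step_j - rss_step_j_plus_1
    residual_mean_square = rss_step_j_plus_1 / (n_cases - j - 2)
    return reduction / residual_mean_square


if __name__ == "__main__":
    # Hypothetical values: the candidate lowers the residual sum from 100 to 80.
    print(f_to_enter(100.0, 80.0, 22, 0))
```

A stepwise procedure would evaluate this for every variable not yet in the equation and enter the one with the largest value, provided it exceeds a chosen threshold.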

3.   The goal in regression is to find the line, $\hat{Y}$, such
that the sum of the squared residuals, $\sum_i (Y_i - \hat{Y}_i)^2$, is
minimized [Hill, 1979]. For the line to be useful, it
is required that the deviations between the observations
and the line be smaller than the deviations between the
observations and the overall mean. Therefore, the quantity
$[\sum_i (Y_i - \bar{Y})^2] - [\sum_i (Y_i - \hat{Y}_i)^2]$ should be large, or one could
say a good line has $\sum_i (Y_i - \hat{Y}_i)^2$ small compared to $\sum_i (Y_i - \bar{Y})^2$.

The regression line is $\hat{Y} = b_0 + b_1 X_1$ or, generally,

\[
\hat{Y}_i = b_0 + \sum_j b_j X_{ji} .
\]


4.   When an independent variable has a low tolerance it
should not be included in a regression equation because
its value can be expressed fairly well using a linear
combination of variables already entered in the equation.
A variable with a low tolerance does not add significantly
to the accuracy of a regression equation and may cause
numerical and statistical accuracy problems [Hill, 1979].
The tolerance is computed by

\[
\text{TOLERANCE} = 1 - R^2_{k \cdot I},
\]

where $R_{k \cdot I}$ is the multiple correlation coefficient of the
entering variable, $X_k$, with the set of independent
variables already in the equation, I. If the computed
value of tolerance is less than a preselected limit
value, a prospective predictor cannot be selected for
the regression equation as it is too highly correlated
with the predictors already selected.
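The tolerance check can be illustrated in Python for the simplest case, where only one predictor has already entered the equation. In that special case the multiple correlation reduces to the ordinary correlation between the candidate and the entered predictor. The function names and the limit value of 0.01 are hypothetical and are not taken from the text.

```python
def squared_correlation(x, z):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    sxx = sum((a - mean_x) ** 2 for a in x)
    szz = sum((b - mean_z) ** 2 for b in z)
    return sxz * sxz / (sxx * szz)


def tolerance_single(candidate, entered):
    """Tolerance of a candidate when exactly one predictor is already entered."""
    return 1.0 - squared_correlation(candidate, entered)


def may_enter(candidate, entered, limit=0.01):
    """True if the tolerance is at least the preselected limit (hypothetical value)."""
    return tolerance_single(candidate, entered) >= limit


if __name__ == "__main__":
    # The candidate is an exact linear function of the entered predictor.
    print(may_enter([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))
```

With several predictors already in the equation, the squared correlation would be replaced by the R-squared from regressing the candidate on all of them.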

APPENDIX D
VERIFICATION SCORE FORMULAE

1.   The two scores, percent correct and Heidke skill
score, use a verification matrix as follows. (A 2x2
matrix is used as an example, but the technique may be
applied to any size matrix.)

                       estimated

                      A      B   |   i
     observed         C      D   |   k
                    -------------
                      j      l

     i = A+B      k = C+D      j = A+C      l = B+D

(a)  Percent Correct

\[
\text{Percent Correct} = \frac{A+D}{A+B+C+D} \times 100
= \frac{\text{number of correct estimates}}{\text{total number of estimates}} \times 100
\]

(b)  Heidke skill score

\[
\text{Heidke skill score} = \frac{(A+D) - \mathrm{EXP}}{(A+B+C+D) - \mathrm{EXP}}
= \frac{\text{number of correct estimates} - \text{correct number expected due to chance}}
       {\text{total number of estimates} - \text{correct number expected due to chance}},
\]

where

\[
\mathrm{EXP} = \frac{(i \cdot j) + (k \cdot l)}{A+B+C+D} .
\]

2.   Bias Calculation

\[
\text{Bias in estimating a given category} =
\frac{\text{number of estimates of a given category}}
     {\text{number of observations of same category}},
\]

such as j/i or l/k.
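The three verification scores can be computed from a square verification matrix with a short Python sketch. It assumes rows are observed categories and columns are estimated categories; the function name `verification_scores` and the example counts are hypothetical.

```python
def verification_scores(matrix):
    """Return (percent correct, Heidke skill score, bias per category).

    matrix[r][c] is the number of cases observed in category r and
    estimated in category c.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(size))
    row_totals = [sum(row) for row in matrix]  # observations per category
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]  # estimates
    # Number of correct estimates expected due to chance (EXP).
    expected = sum(row_totals[k] * col_totals[k] for k in range(size)) / total
    percent_correct = 100.0 * correct / total
    heidke = (correct - expected) / (total - expected)
    # Bias is undefined for a category that was never observed.
    bias = [col_totals[k] / row_totals[k] for k in range(size)]
    return percent_correct, heidke, bias


if __name__ == "__main__":
    # Hypothetical 2x2 counts: A=30, B=10, C=20, D=40.
    print(verification_scores([[30, 10], [20, 40]]))
```

The same function applies unchanged to a 5x5 matrix, since the chance expectation sums the products of matching row and column totals over all categories.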

APPENDIX E
SELECTED  VERIFICATION  MATRICES 

The  following  verification  matrices  show  the  number 
of  observations  in  relation  to  the  number  of  regression 
estimates  for  each  visibility  category.   The  top  number 
in  each  block  is  derived  from  dependent  data  and  the 
bottom  number  from  independent  data.   Row  and  column 
totals  are  given  in  the  margins. 

1.   Verification Matrix for 5P00:

                          Regression estimated category
Observed category          I     II    III     IV      V   TOTAL

I      dependent           2      2    174    273     70     521
       independent         8      2    225    293     80     608

II     dependent           4      5    133    231     74     447
       independent         2      1     99    165     60     327

III    dependent           3      2    110    323    150     588
       independent         1      2    105    239    197     544

IV     dependent           1      0     58    299    340     698
       independent         0      1     48    234    408     691

V      dependent           0      0     54    455   1316    1825
       independent         1      0     39    448   2009    2557

TOTAL  dependent          10      9    529   1581   1950
       independent        12      6    516   1379   2819

2.   Verification Matrix for 5P24:


                          Regression estimated category
Observed category          I     II    III     IV      V   TOTAL

I      dependent          21     21    111    331     57     541
       independent        13      4    129    337     97     580

II     dependent          13     10     91    269     81     464
       independent         6      5     58    174     68     311

III    dependent          14      4     64    305    201     588
       independent         3      6     54    231    226     520

IV     dependent           2      4     34    260    398     698
       independent         0      4     33    198    436     671

V      dependent           3      0     31    410   1360    1804
       independent         3      3     44    350   2088    2488

TOTAL  dependent          53     39    331   1575   2097
       independent        25     22    318   1290   2915

3.   Verification Matrix for 5P48:


                          Regression estimated category
Observed category          I     II    III     IV      V   TOTAL

I      dependent           2      2     70    365    127     566
       independent         0      0     40    336    171     547

II     dependent           1      2     34    286    143     466
       independent         0      0     24    147    131     302

III    dependent           0      1     33    269    295     598
       independent         0      2     14    193    295     504

IV     dependent           0      0     18    206    461     685
       independent         0      0      9    175    468     652

V      dependent           0      0     17    298   1472    1787
       independent         0      0     14    276   2126    2416

TOTAL  dependent           3      5    172   1424   2498
       independent         0      2    101   1127   3191

4.   Verification Matrix; Probabilistic vs. Categorical

This verification matrix shows results from dependent
data for the probabilistic scheme of Aldinger [1979] vs.
the 5CAT categorical scheme of this study. The upper
values in each block are for the probabilistic scheme,
the lower values are for the categorical scheme.

                          Regression estimated category
Observed category          I     II    III     IV      V   TOTAL

I      probabilistic     106    275    139    113     81     714
       categorical         7      3    106    504     94     714

II     probabilistic      76    275    264    198     93     906
       categorical         5      2    100    644    155     906

III    probabilistic      83    284    483    461    141    1452
       categorical         2      2     90    902    456    1452

IV     probabilistic      77    232    380    976    246    1911
       categorical         1      1     60    820   1029    1911

V      probabilistic     117    327    333   2240   1120    4137
       categorical         0      1     53   1110   2973    4137

TOTAL  probabilistic     459   1393   1599   3988   1681
       categorical        15      9    409   3980   4707